This notebook analyzes tweets by wizkidfc over a one-week period. The dataset contains 2076 tweets and comes as an Excel file with two sheets: the first holds the tweets and information about each tweet, while the second holds information about the tweep behind each tweet.

We begin by importing the necessary libraries and packages.

In the cell below, the data in the tweets sheet is imported and assigned to the variable wizkid. The cells after it help us learn more about the features of the dataset and get a general feel for the data.

The unnamed features are not required, so they are removed.

We're not really going to be working with pictures or videos, so it's better to just drop the media URL and URLs columns.
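A minimal pandas sketch of the two cleanup steps above (dropping the unnamed columns and the media/URL columns). The real column names aren't shown in this notebook, so `Media URL` and `URLs` are assumptions, and `drop_unneeded_columns` is a hypothetical helper name. In the notebook itself the frame would come from something like `pd.read_excel(path, sheet_name=...)`.

```python
import pandas as pd

# Assumed names for the media/URL columns; adjust to match the actual sheet.
MEDIA_COLUMNS = ("Media URL", "URLs")


def drop_unneeded_columns(df: pd.DataFrame, extra=MEDIA_COLUMNS) -> pd.DataFrame:
    """Drop pandas' auto-named 'Unnamed: N' columns plus the given extra columns."""
    # Header-less columns in an Excel sheet are read in as 'Unnamed: 0', 'Unnamed: 1', ...
    unnamed = [c for c in df.columns if str(c).startswith("Unnamed")]
    # errors="ignore" skips any listed column that isn't actually present
    return df.drop(columns=unnamed + list(extra), errors="ignore")


# Small stand-in frame; the real one would be the tweets sheet
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "Tweet Text": ["hello", "world"],
    "Media URL": [None, "http://example.com/a.jpg"],
    "URLs": [None, None],
})
cleaned = drop_unneeded_columns(sample)
print(list(cleaned.columns))
```

Using `errors="ignore"` keeps the helper safe to rerun on a frame that has already been cleaned.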

Next, we check the total number of tweets in that period.

To begin our Exploratory Data Analysis (EDA), we're going to first check for the most retweeted tweets in the period.

It's better to put these in a DataFrame, so they can be visualized more easily.

Yeah, it's better this way. It's even better when we visualize it as a chart, though.
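A small pandas sketch of the "most retweeted" table, assuming the tweets sheet has columns named `Tweet Text` and `Retweets` (both names are guesses, and `top_retweeted` is a hypothetical helper).

```python
import pandas as pd


def top_retweeted(df: pd.DataFrame, n: int = 10,
                  text_col: str = "Tweet Text",
                  count_col: str = "Retweets") -> pd.DataFrame:
    """Return the n most-retweeted tweets as a small DataFrame, highest first."""
    # nlargest sorts by count_col in descending order and keeps the top n rows
    top = df.nlargest(n, count_col)[[text_col, count_col]]
    # reset_index gives a clean 0..n-1 index for display
    return top.reset_index(drop=True)


sample = pd.DataFrame({
    "Tweet Text": ["a", "b", "c", "d"],
    "Retweets": [5, 42, 0, 17],
})
print(top_retweeted(sample, n=2))
```

For the chart, something like `top_retweeted(wizkid).plot.barh(x="Tweet Text", y="Retweets")` would work, provided matplotlib is installed.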

Going further, another thing I'd like to check is the popularity of the different types of tweet clients.

The cell below checks the value count of each tweet client. From the output, we can see that Android is the most popular choice of smartphone among wizkid fans in this dataset. You might have thought it was going to be iPhone, but Android wins again.

One thing that surprises me most among the tweet clients is the wizkid retweet bot, which helps to retweet posts. Shout-out to whoever created this bot: you're also using your skills to help the fandom.

To visualize it better, I'm gonna be using a bar chart once again.
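The client count boils down to a single `value_counts()` call. This sketch assumes the client is stored in a column called `Tweet Source` (an assumed name), and the sample rows are made up.

```python
import pandas as pd


def client_counts(df: pd.DataFrame, source_col: str = "Tweet Source") -> pd.Series:
    """Count how many tweets came from each client, most common first."""
    # value_counts sorts in descending order by default
    return df[source_col].value_counts()


sample = pd.DataFrame({"Tweet Source": [
    "Twitter for Android", "Twitter for iPhone", "Twitter for Android",
    "Twitter Web App", "Twitter for Android",
]})
counts = client_counts(sample)
print(counts.head(3))
```

The bar chart would then be `client_counts(wizkid).head(10).plot.bar()`, again assuming matplotlib is available.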

The next thing I'm going to do is analyze tweet sentiment.

Just a side note: I didn't know much about Natural Language Processing before starting this analysis, so I had to do some reading and also take a quick course on it. I'm still not fully conversant with it, but I kind of know my way around it now.

For the sentiment analysis, I used TextBlob, a Python library for processing textual data. TextBlob is a high-level library built on top of the NLTK library.

The function below gets the subjectivity and polarity of each tweet. Subjectivity measures how much a tweet expresses personal opinion, emotion or judgement rather than factual information. It is a float in the range [0, 1], where 0 is very objective and 1 is very subjective.

Polarity is also a float, in the range [-1, 1], where 1 means a positive statement and -1 means a negative statement.
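Here's a sketch of how the two scores could be attached to every tweet. TextBlob's `sentiment` property returns a `(polarity, subjectivity)` named tuple. The column names (`Tweet Text`, `Subjectivity`, `Polarity`) and the `scorer` parameter are assumptions. The parameter is only there so the pandas plumbing can be tried with a stand-in scorer, and TextBlob is imported lazily inside `textblob_scores` for the same reason.

```python
from typing import Callable, Tuple

import pandas as pd


def textblob_scores(text: str) -> Tuple[float, float]:
    """Return (subjectivity, polarity) for one tweet using TextBlob."""
    # Imported here so add_sentiment_columns can be tried without TextBlob installed
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment  # named tuple: (polarity, subjectivity)
    return sentiment.subjectivity, sentiment.polarity


def add_sentiment_columns(
    df: pd.DataFrame,
    text_col: str = "Tweet Text",
    scorer: Callable[[str], Tuple[float, float]] = textblob_scores,
) -> pd.DataFrame:
    """Return a copy of df with 'Subjectivity' and 'Polarity' columns added."""
    scores = [scorer(text) for text in df[text_col]]
    out = df.copy()
    out["Subjectivity"] = [subjectivity for subjectivity, _ in scores]
    out["Polarity"] = [polarity for _, polarity in scores]
    return out
```

With TextBlob installed, `wizkid = add_sentiment_columns(wizkid)` would score the whole sheet.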

After getting the polarity of each tweet, it makes more sense to categorize them as positive, negative or neutral.

Tweets with a polarity below zero are negative, tweets with a polarity of exactly zero are neutral, and tweets with a polarity above zero are positive.
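The rule above translates directly into a small mapping function. `polarity_label` is a hypothetical name, and the scores below are made up.

```python
import pandas as pd


def polarity_label(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if polarity < 0:
        return "Negative"
    if polarity == 0:
        return "Neutral"
    return "Positive"


scores = pd.Series([-0.4, 0.0, 0.35, 0.8])
labels = scores.map(polarity_label)
print(labels.value_counts())
```

On the real data this would be `wizkid["Polarity"].map(polarity_label)`, where the `Polarity` column name is assumed. Calling `value_counts()` on the result gives the per-sentiment counts, and `.plot.bar()` draws the chart.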

The next cell checks the value count of each tweet sentiment: tweets with positive sentiment came out on top. 😊🎉🎉 You're happy, right? Well, I am too 😂. Let's just try to keep tweets with negative sentiment down 🤞🤞.

As always, the analysis is easier to read as a chart, and a bar chart works best here too.

The next analysis is a wordcloud based on the most frequent words in the tweet column. Before we can plot the wordcloud, though, we have to do some preprocessing.

The cell below converts all the tweets to lowercase.

Punctuation is also removed, which reduces unnecessary noise in the dataset.

After that, we also remove the repeating characters from the words.

This next cell removes URLs from all the tweets.

This next cell removes numbers from the tweets.
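The cleaning steps above (lowercasing, punctuation, repeated characters, URLs, numbers) can be combined into one function. This is a sketch under a few assumptions of my own. First, it strips URLs *before* punctuation, because removing punctuation first would merge a link like `https://t.co/abc` into one junk word. Second, it shortens runs of three or more identical characters to two, so that normal doubles like "good" and "cool" survive. The name `clean_tweet` is hypothetical.

```python
import re
import string

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Three or more of the same character in a row, e.g. "sooooo"
REPEAT_RE = re.compile(r"(.)\1{2,}")
DIGIT_RE = re.compile(r"\d+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, punctuation, long character runs and numbers."""
    text = text.lower()
    # Remove URLs before punctuation so links don't collapse into one junk word
    text = URL_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    # Shorten runs of 3+ repeated characters to two
    text = REPEAT_RE.sub(r"\1\1", text)
    text = DIGIT_RE.sub(" ", text)
    # Collapse the extra whitespace left behind by the substitutions
    return " ".join(text.split())


print(clean_tweet("Wizkid is sooooo GOOD!!! 100% https://t.co/abc123"))
```

Applied to a column, this would look like `wizkid["Clean"] = wizkid["Tweet Text"].map(clean_tweet)`, with both column names assumed.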

This next cell tokenizes the cleaned tweets. Tokenization splits each sentence into its individual words (tokens).
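The notebook probably used an NLTK tokenizer here. This is a dependency-free stand-in that I've swapped in, not the notebook's method. For text that has already been cleaned, a word-character regex gives much the same result. NLTK's `word_tokenize` also works but needs its tokenizer data downloaded first. `tokenize` is a hypothetical name.

```python
import re
from typing import List

TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    """Split an already-cleaned, lowercased tweet into word tokens."""
    # Runs of word characters become tokens; spaces and any leftovers are dropped
    return TOKEN_RE.findall(text)


print(tokenize("wizkid is soo good"))
```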

Finally, we perform stemming (reducing words to their stems) and lemmatization (reducing inflected words to their base form, known as the lemma).

Phew 😪. At long last, the preprocessing part is done.

So, this is the part we've been preprocessing for 🎉🎉
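A wordcloud is drawn from word frequencies, so here is a sketch of counting them with `collections.Counter`. The small stop-word set is only illustrative and is my assumption. A fuller list, such as `STOPWORDS` from the wordcloud package or NLTK's stopwords corpus, would be used in practice. `word_frequencies` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List

# Tiny illustrative stop-word list (an assumption, not the notebook's list)
STOP_WORDS = {"the", "is", "a", "and", "to", "of", "in", "it", "rt"}


def word_frequencies(token_lists: Iterable[List[str]], top_n: int = 50) -> dict:
    """Count tokens across all tweets, skipping stop words, and keep the top_n."""
    counts = Counter(
        token for tokens in token_lists for token in tokens if token not in STOP_WORDS
    )
    return dict(counts.most_common(top_n))


tweets = [["wizkid", "is", "the", "best"], ["wizkid", "and", "davido"], ["love", "wizkid"]]
freqs = word_frequencies(tweets, top_n=3)
print(freqs)
```

The resulting dict can be passed to `WordCloud().generate_from_frequencies(freqs)` from the wordcloud package, and the image shown with matplotlib's `imshow`.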

So, here is the wordcloud (I almost typed soundcloud 😂). As we can see from the chart, wizkid is the most popular word here, as it has to be, since all of this is about him. Next is davido, which was kinda expected, as the two get compared in almost every tweet (which isn't at all necessary). I can see burna too, but boy is missing. That's another insight: people usually drop the "Boy" and just call him Burna. Next is grammys, which has been the major subject in recent tweets. I feel bad he didn't win, but we can go again next time 🤞.

I can also see love, so wizkidfc is preaching love. That's nice, really nice 😂

But then, I can also see true, which is almost the same size as love, so wizkidfc is preaching true love. That's lovely 😂 (I hope you understand what I did here).

So, here is a wordcloud of tweets with positive sentiment. Because sentiment is assigned to whole tweets rather than to individual words, some words land in the wrong category, but we can still see the words that are categorized correctly.

And here is a wordcloud of tweets with negative sentiment. The same caveat applies: some words are categorized wrongly, but we can still see the ones that are categorized correctly.
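The per-sentiment clouds come from filtering the tweets by label before counting. That filtering is also why some words end up in the "wrong" cloud: each word inherits the label of the tweet it appears in. This sketch assumes a `Sentiment` label column and a `Tokens` column of token lists. Both names and the helper `frequencies_for` are hypothetical.

```python
from collections import Counter

import pandas as pd


def frequencies_for(df: pd.DataFrame, label: str, label_col: str = "Sentiment",
                    tokens_col: str = "Tokens", top_n: int = 50) -> dict:
    """Word counts restricted to tweets whose sentiment label equals `label`."""
    subset = df.loc[df[label_col] == label, tokens_col]
    # Every token counts toward its tweet's label, whatever the word itself means
    counts = Counter(token for tokens in subset for token in tokens)
    return dict(counts.most_common(top_n))


df = pd.DataFrame({
    "Tokens": [["love", "wizkid"], ["hate", "noise"], ["love", "song"]],
    "Sentiment": ["Positive", "Negative", "Positive"],
})
print(frequencies_for(df, "Positive"))
print(frequencies_for(df, "Negative"))
```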

The next analysis I want to do is based on the time of the tweets. There are a lot of insights that can be gotten from here.

I'm going to convert the Created At column into a datetime column, so it's easier to work with.

From the Created At column, I'll be creating two new columns: a day column and an hour column.

After this, we'll map each day to its respective day of the week.
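A pandas sketch of the time features described above. It uses `pd.to_datetime` plus the `.dt` accessor, where `day_name()` handles the day-of-week mapping directly. The new column names and the helper `add_time_columns` are assumptions, and the timestamps are made up.

```python
import pandas as pd


def add_time_columns(df: pd.DataFrame, time_col: str = "Created At") -> pd.DataFrame:
    """Parse the timestamp column and derive day, hour and day-of-week columns."""
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col])
    out["Day"] = out[time_col].dt.day             # day of the month, 1-31
    out["Hour"] = out[time_col].dt.hour           # hour of the day, 0-23
    out["Weekday"] = out[time_col].dt.day_name()  # e.g. "Saturday"
    return out


sample = pd.DataFrame({"Created At": ["2023-01-07 13:45:00", "2023-01-08 08:05:00"]})
print(add_time_columns(sample)[["Day", "Hour", "Weekday"]])
```

Running `add_time_columns(wizkid)["Weekday"].value_counts()` shows at a glance how the tweets spread across the week.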

After checking the days of the week, I realized all 2076 tweets were posted on a Saturday 😂. That's a lot for one day, really too much.

Because of this, there's not much of a pattern that can be generated, since we only have a single day of tweets to work with.

When I have time for this again, I'm gonna try to scrape more tweets covering a longer duration and do more analysis on them. More insight can be generated then.

The next thing I want to look at is who most deserves the wizkid fan badge in this period. I'll be checking the number of tweets each user made, as well as their sentiment.
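A sketch of the per-user leaderboard: it counts tweets per user, counts the positive ones, and computes their share. It assumes a `Username` column and a `Sentiment` label column (both names are guesses), and `fan_leaderboard` is a hypothetical helper.

```python
import pandas as pd


def fan_leaderboard(df: pd.DataFrame, user_col: str = "Username",
                    label_col: str = "Sentiment") -> pd.DataFrame:
    """Per-user tweet count, positive-tweet count and positive share, most active first."""
    is_positive = df[label_col].eq("Positive")
    # Named aggregation on a SeriesGroupBy: one output column per keyword
    board = is_positive.groupby(df[user_col]).agg(tweets="count", positive="sum")
    board["positive_share"] = board["positive"] / board["tweets"]
    # Rank by activity first, breaking ties by the number of positive tweets
    return board.sort_values(["tweets", "positive"], ascending=False)


sample = pd.DataFrame({
    "Username": ["a", "a", "a", "b", "b", "b", "c"],
    "Sentiment": ["Positive", "Positive", "Negative",
                  "Positive", "Neutral", "Negative", "Positive"],
})
print(fan_leaderboard(sample))
```

Breaking ties on the positive count is one way to separate two users with the same number of tweets. For a user with 21 positive tweets out of 35, `positive_share` comes out to 0.6.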

There are two people with 35 tweets on this Saturday alone, but uptownguy came out on top for a reason, so let's check him out.

Here is the wordcloud for uptownguy's posts (he also talks a lot about davido, but we're gonna overlook that sha). Apart from that, every other word seems positive.

This is impressive, really: 21 of uptownguy's 35 posts are positive, which is 60% of all his posts. That's cool, uptownguy 🎉😂

Let's check some info about uptownguy. To do this, we're going to query the users sheet.
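A sketch of looking up one handle in the users sheet with a boolean mask. The match ignores case, since handles are often typed inconsistently. The `Username` column name and the helper `lookup_user` are assumptions, and the sample rows are invented.

```python
import pandas as pd


def lookup_user(users: pd.DataFrame, username: str,
                user_col: str = "Username") -> pd.DataFrame:
    """Return the rows of the users sheet for one handle (case-insensitive match)."""
    mask = users[user_col].str.lower() == username.lower()
    return users.loc[mask]


users = pd.DataFrame({
    "Username": ["fan_one", "Fan_Two"],
    "Followers": [10, 20],
})
print(lookup_user(users, "fan_two"))
```

An exact-match alternative is `DataFrame.query`, e.g. `users.query("Username == @name")` with `name` defined in the calling scope.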

The query shows he has 122 followers, he's a true wizkid fan, and he's also an Arsenal fan. I don't know if this holds generally, but I usually see a lot of wizkid fans who are also Messi fans, so let's run a quick check on that.
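One way to run that quick check is to count how many user bios mention each name. This sketch assumes the bio lives in a `Description` column (an assumed name), and `count_mentions` is a hypothetical helper. Plain substring matching is a rough proxy: a bio that only says "CR7" won't count for Ronaldo, and "messi" would also match a word like "messiah".

```python
import pandas as pd


def count_mentions(users: pd.DataFrame, keywords, text_col: str = "Description") -> dict:
    """Count users whose bio mentions each keyword (case-insensitive substring match)."""
    # Empty bios become "" so str.contains never sees a missing value
    bios = users[text_col].fillna("").str.lower()
    return {kw: int(bios.str.contains(kw.lower(), regex=False).sum()) for kw in keywords}


users = pd.DataFrame({"Description": [
    "Messi fan | Wizkid FC", "CR7 & Ronaldo forever", "ronaldo and wizkid", None,
]})
print(count_mentions(users, ["Messi", "Ronaldo"]))
```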

Ahhhhh. My guess was wrong after all: wizkid has more mutual fans with Ronaldo than with Messi. It's still good to see that a good number of them are Messi fans.

Another insight here is that wizkid fans also like burna. That's cool, two Grammy Award winners 😂.

So, we have finally come to the end of this. It was worthwhile, and I made some pretty interesting discoveries. Cheers 🎉🎉